Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gossip in drone and wallet to identify leader ports #1002

Merged
merged 3 commits into from
Aug 23, 2018

Conversation

CriesofCarrots
Copy link
Contributor

@CriesofCarrots CriesofCarrots commented Aug 17, 2018

Fixes #934 and unblocks #920

@CriesofCarrots CriesofCarrots requested a review from pgarg66 August 17, 2018 22:44
@CriesofCarrots
Copy link
Contributor Author

@pgarg66, this should work with your pr #920
Currently, this code has drone/wallet still reading leader.json, and then pulling the leader's ncp address from there.

@garious
Copy link
Contributor

garious commented Aug 19, 2018

That's too much code to copy-paste. A utility function in thin_client.rs, perhaps?

Also, is this really what we want? On Google Cloud, will this return the public addresses a client would need or the private ones the network is using to minimize costs?

@aeyakovenko
Copy link
Member

aeyakovenko commented Aug 19, 2018

the public/private IPs are complicated. Maybe the network should advertise many IPs, in order of preference, but a single set of ports for all the services. The clients can find whatever works for them. But the above should probably be addressed in a different PR/issue.

src/bin/drone.rs Outdated
crdt.write().unwrap().insert(&leader_entry_point);

// Block until leader's correct contact info is received
while crdt.read().unwrap().leader_data().is_none() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a sleep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Between the crdt write and read? How long and why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will spin the thread without yielding to the rest of the system to actually process anything to get the leader_data. if you sleep, that tells the OS that it can do other useful things. 100ms is probably good enough.

maybe pull this out into a function that polls for X seconds and returns an error if it fails. otherwise we may end up with this stuck in a misconfigured test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recent push implements a configurable timeout for both wallet and drone gossip-poll. (Timeout can also be none, if blocking is desired.) I plugged in some starting values in the bash scripts that are reasonable for my local machine. @aeyakovenko , do you have a feel for how long is reasonable for tests/automation? Maybe a question for @mvines ?
Presumably the wallet timeout can be short, since we expect a full node to be up, but I'm not clear on how long we would expect between drone and leader boot.

}
}

exit.store(true, Ordering::Relaxed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CriesofCarrots CriesofCarrots force-pushed the drone-wallet-gossip branch 3 times, most recently from 503fee3 to 20a0abd Compare August 21, 2018 04:06
@garious
Copy link
Contributor

garious commented Aug 21, 2018

Hey @rob-solana, are you keeping an eye on the CLI options from all our various apps? This PR adds -w for "wait" where the default is to block. Seems a little funky to me. Thoughts?

@rob-solana
Copy link
Contributor

If I read this correctly, the timeout is in the user's hands, i.e. if no -w argument is presented, the code keeps trying forever. That sounds correct to me.

src/bin/drone.rs Outdated
Arg::with_name("wait")
.short("w")
.long("wait")
.value_name("NUMBER")
Copy link
Contributor

@rob-solana rob-solana Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clap is pretty awesome... I recommend you pass something that looks like help to value_name().

when you ask for help with this code you see something like

--wait NUMBER time to wait for leader info from gossip

What units? Why does this thing need "leader info"?

Were I the user, I'd prefer something that conveyed the units and better captured semantics, e.g.:

--timeout SECS wait at most SECS seconds to get necessary gossip from the network

Copy link
Contributor

@rob-solana rob-solana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some things ok, I think. some ergonomics polish might be worth the time

@garious
Copy link
Contributor

garious commented Aug 22, 2018

I agree --timeout SECS would convey more meaning. And I'd be okay if there was no short form for this option (-w/-t), since it may only be used in automation. Using the long form would make those scripts more readable. @CriesofCarrots can you make that change?

@CriesofCarrots
Copy link
Contributor Author

Great suggestions!

@garious garious assigned aeyakovenko and unassigned rob-solana Aug 22, 2018
@garious
Copy link
Contributor

garious commented Aug 22, 2018

@aeyakovenko (or @mvines) should this PR be blocked on that public/private IP issue?

@garious
Copy link
Contributor

garious commented Aug 23, 2018

@CriesofCarrots, crickets. Merge it at your leisure.

Tyera Eulberg added 3 commits August 23, 2018 12:30
@CriesofCarrots CriesofCarrots added the automerge Merge this Pull Request automatically once CI passes label Aug 23, 2018
@solana-grimes
Copy link
Contributor

💔 Unable to automerge due to CI failure

@solana-grimes solana-grimes removed the automerge Merge this Pull Request automatically once CI passes label Aug 23, 2018
@CriesofCarrots CriesofCarrots merged commit d4c4121 into solana-labs:master Aug 23, 2018
@CriesofCarrots CriesofCarrots deleted the drone-wallet-gossip branch August 23, 2018 23:13
vkomenda pushed a commit to vkomenda/solana that referenced this pull request Aug 29, 2021
)

* Bump eslint from 7.13.0 to 7.17.0 in /token-lending/js

Bumps [eslint](https://github.com/eslint/eslint) from 7.13.0 to 7.17.0.
- [Release notes](https://github.com/eslint/eslint/releases)
- [Changelog](https://github.com/eslint/eslint/blob/master/CHANGELOG.md)
- [Commits](eslint/eslint@v7.13.0...v7.17.0)

Signed-off-by: dependabot[bot] <[email protected]>

* fix: add mkdirp dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Justin Starry <[email protected]>
yihau pushed a commit to yihau/solana that referenced this pull request May 17, 2024
…olana-labs#1002)

* replay: only vote on blocks with >= 32 data shreds in last fec set

* pr feedback: pub(crate), inspect_err

* pr feedback: error variants, collapse function, dedup

* pr feedback: remove set_last_in_slot, rework test

* pr feedback: add metric, perform check regardless of ff

* pr feedback: mark block as dead rather than duplicate

* pr feedback: self.meta, const_assert, no collect

* pr feedback: cfg(test) assertion, remove expect and collect, error fmt

* Keep the collect to preserve error

* pr feedback: do not hold bank_forks lock for mark_dead_slot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants